Add sid gr model with validation on amzn beauty dataset #265

JacoCheung · 2026-01-07T08:46:47Z

Description

Related issue: #240
This PR adds a new model: a semantic-ID (SID) based decoder.
Some highlights:

We use the standard mcore transformer block as the backbone of our decoder. However, our jagged input and beam search require special masks, so we build a padded dense attention mask and invoke mcore's local SDPA implementation.
We implemented a simple beam search module, which is called during evaluation.
The model is not a SOTA implementation, and its convergence still needs to be confirmed. Only the Amazon Beauty dataset has been tested.
Only a single GPU is supported for now.
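For readers unfamiliar with the padded dense mask mentioned above, here is a minimal, framework-free sketch of the idea. It is not the code in this PR: the function names `padded_causal_mask` and `invert` are hypothetical, plain Python lists stand in for tensors, and it covers only the jagged (variable-length) causal case, not the beam search masks. The polarity is also an assumption; here `True` means "may attend", and some backends expect the opposite, so check the convention of the attention implementation you call.

```python
from typing import List

Mask = List[List[List[bool]]]


def padded_causal_mask(seq_lens: List[int]) -> Mask:
    """Build a [batch, max_len, max_len] mask from jagged sequence lengths.

    True means query position i may attend to key position j.
    """
    max_len = max(seq_lens, default=0)
    mask: Mask = []
    for n in seq_lens:
        # A query attends to a key only if both are real tokens (< n)
        # and the key is not in the future (j <= i).
        mask.append(
            [[i < n and j < n and j <= i for j in range(max_len)]
             for i in range(max_len)]
        )
    return mask


def invert(mask: Mask) -> Mask:
    """Flip polarity for backends where True means 'masked out'."""
    return [[[not v for v in row] for row in sample] for sample in mask]


# Two sequences of length 3 and 1, padded to max_len = 3.
lens = [3, 1]
m = padded_causal_mask(lens)
```

Note that rows belonging to padded query positions are fully masked in this sketch. A softmax over a fully masked row yields NaN, so a real implementation typically has to handle those rows explicitly, for example by discarding their outputs or letting padded queries attend to themselves.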

Checklist

Code Cleaning
README
CI pipeline (integration test)

~~CI:~~
CI

shijieliu · 2026-01-08T07:26:13Z

@JacoCheung How about moving all the ops and modules under commons?

JacoCheung requested review from shijieliu and z52527 January 7, 2026 08:46

JacoCheung changed the title ~~Add sid gr model with validation on amzn beauty dataset~~ [Draft] Add sid gr model with validation on amzn beauty dataset Jan 7, 2026

shijieliu reviewed Jan 8, 2026

examples/sid_gr/README.md

shijieliu reviewed Jan 8, 2026

examples/sid_gr/datasets/__init__.py

JacoCheung added 24 commits January 9, 2026 03:38

Move pipeline to commons

a8f55ba

Move jagged concat ops and embedding to common

0250e5b

Move ShardedEmbeddingConfig to commons

64ce0e5

Add sid gr model definition

4b0cfca

Move distributed to commons

5541234

Move triton_ops.common to commons

bbc4429

Add runnable GPTSID GR model

c86417d

Add training pipeline and random dataset trainable

97a4b94

Restore mypy check

265d5a2

Add disk dataset/dataloader and its utest

a46a445

Separate history_seqlen and candidate_seqlen

acea303

Fix the dataset and emb args feat name mismatch

7c54cd1

Sort samples by userid for debugging

2bb9090

Enable arbitrary mask with local attention impl

4e4118e

Add beam search functionality

1f814ea

Add beam search individual module and eval metric test

a7a0316

Add beam history sids check and eval metrics to gptmodel

05a498b

Support dynamic beam and handle case when topk > num_candidates

6a05e79

Fix bos split bug and add more hist info in bEamSearch

580c37f

Fix mask def for mcore and mask construction error, make model overfit single batch, add hitrate

070e938

Add RMSNorm

260e986

Fix attention mask definition and enable loss on history

b9ffa87

Add incomplete batch dataset test

59ff4c7

Fix incomplete eval batch

d4a93e4

JacoCheung added 10 commits January 9, 2026 03:38

Fix generation mask

240644d

Fix config for debugging eval

7b80791

Enable single shared lm_head or individual lm_head across hierarchies

8a9e3a3

Add license header

1af0f04

Add sid_gr README

482b277

Fix all utests of sid gr and update sid_amazn config

caa9ed9

Adjust the img size of sid ReadMe

7ceeba1

Lessen the max_train_iters

6d55440

Rename data -> datasets

eae52ab

Rename hstu/dataset -> hstu/datasets and fix commons import error in tests

01c39f7

JacoCheung force-pushed the junzhang/fea_sid_gr_gpt branch from 0392315 to 01c39f7 January 9, 2026 03:39

JacoCheung requested a review from geoffreyQiu January 9, 2026 03:40

JacoCheung changed the title ~~[Draft] Add sid gr model with validation on amzn beauty dataset~~ Add sid gr model with validation on amzn beauty dataset Jan 9, 2026

Restore HKV commit

74b1a0c

shijieliu reviewed Jan 9, 2026

examples/sid_gr/README.md

examples/sid_gr/modules/eval_metrics.py

examples/sid_gr/tests/test_beam_search.py

Remove sid/ops and move training link to head of README

6007d51

shijieliu approved these changes Jan 12, 2026

shijieliu merged commit 0ff7357 into NVIDIA:main Jan 12, 2026
